In suitable environments, proteins, nucleic acids and certain synthetic polymers fold into unique conformations. This work shows that it is possible to construct lattice models of foldable heteropolymers by expressing the energy only in terms of individual properties of monomers, such as the exposure to the solvent and the steric factor.
It is generally believed that the hydrophobic interaction plays a major role in protein folding […]. Under physiological conditions, nonpolar amino acids are buried inside the core of the native state of a protein to avoid contact with water molecules. A long-standing question is to what extent other non-covalent forces, such as hydrogen bonding, electrostatic and van der Waals interactions, contribute to stabilizing the folded state […].

Unraveling the different roles played by these interactions will have a considerable impact on different areas of research in biophysics, such as the prediction of protein structures […], the design of synthetic drugs […], [11]-[13], and the production of self-assembling non-biological polymers [14] and other polymeric materials [15].

With the advent of genome projects [16], a wide gap is opening between the number of known protein sequences and the number of their corresponding structures [17]. At present, the bottleneck in protein structure prediction is largely due to the incorrect treatment of the interactions [18]-[20]. The current state of the art is highlighted by a recent study by Fisher and Eisenberg [21]. They assigned structures to the sequences encoded in the complete genome of Mycoplasma genitalium. Of the complete set of 468 sequences, they were able to assign 103 (22%) to a structure with high confidence. They used homology modelling and threading techniques, which are at present the most successful prediction tools available. Homology modelling is based on the observation that the exact identity of amino acids is not crucial for maintaining the overall fold of a protein [22]. Proteins whose sequences differ by as much as 70% usually share the same fold. Thus, it is possible to predict the conformation of a sequence by using a set of experimentally determined protein structures with similar amino acid composition. Threading relies on the surprising fact that more distantly related proteins, whose sequence similarity is close to the threshold of pure randomness, do sometimes share the same fold […]. The search for compatibility between sequence and structure has inspired various techniques to single out the native state of a protein from a library of alternative structures […]. The screening is typically carried out by assigning an energy-like function that incorporates the compatibility of each amino acid with its local environment within a given conformation. Compatibility is described in terms of charge, polarity and secondary structure content. Details of the local environment also play a crucial role in RNA folding, where a key ingredient is given by metal ion coordination numbers. Likewise, and rather surprisingly, a non-biological polymer (an aromatic hydrocarbon) has recently been designed that is able to fold into a unique helical structure with a large cavity, presumably under the effect of the hydrophobic interaction [14].

This letter contributes to the development of a rational treatment of the hydrophobic interaction at the single-monomer level. We show that it is possible to construct minimalistic lattice models of foldable heteropolymers by introducing an energy function that depends only on the environments of individual residues. A model will be called "foldable" if there are sequences, either randomly chosen or selected, with a unique, thermodynamically stable and kinetically reachable ground state […]. We adopt a simple approximation for the energy which accounts both for the propensity to be exposed to the solvent and for the excluded-volume effects due to the different sizes of the monomers. Although naturally existing or synthesized polymers, such as proteins, nucleic acids and tailored hydrocarbons, are characterized by much more complex interactions, the focus here is on their unifying feature: the tendency of some species of monomers to avoid contact with the solvent. Previous theoretical studies concentrated mainly on the treatment of pairwise contact energies […], [28]-[32]. In contrast, the present study investigates the hydrophobic effect at the level of individual particles.

Lattice models, although often criticized [33], have been recognized to capture some of the most relevant thermodynamic features of the folding process [34], such as the existence of a unique ground state amenable to exact computation, and the cooperativity of the transition. Even key dynamical processes, such as the nucleation-condensation mechanism [35], have been validated with the help of lattice models [36].

Results obtained on a 2D square lattice are presented first. On a lattice, a polymer is represented as a connected chain of $N$ monomers. Hydrophobicity and steric factors can be modeled as the tendency of a monomer to have a specific number of non-bonded nearest neighbors. We define the hydrophobic model HM1 by expressing the energy $E_1$ as,
where $n(a_i)$ is the number of non-bonded nearest neighbors of the monomer of species $a_i$ in position $i$ along the chain, and $\bar n(a_i)$ is the ideal value of $n(a_i)$. This expression was first proposed by Hao and Scheraga, who supplemented it with a pairwise energy term [37]. They presented a method to optimize energy parameters to obtain lattice models of foldable polymers. Other previous work has been devoted to the hydrophobic interaction, although without specifically disentangling it from other interactions. Mirny and Domany [32] explicitly introduced a hydrophobic term in the energy function and performed various tests of fold recognition and dynamics. In a recent work, Li, Tang and Wingreen [38] discussed the "designability principle" […] in terms of a "binary" model with two species of amino acids, where the energy is expressed in terms of the exposure to the solvent only. The model proposed here is much more general, and no major modification is required to extend it to realistic models of foldable heteropolymers, for example in a "contact map" representation of protein structure.
We will first investigate whether the HM1 model gives rise to foldable sequences in 2D, and we will compare our results with those obtained using the standard HP model […]. In 2D, with the aid of the HP model […], it has been demonstrated that two species of monomers suffice to guarantee the uniqueness of the ground state, although probably not the correct order of the folding transition […]. In the HP model the energy is written in the pairwise contact approximation

$$E_{\mathrm{pair}} = \sum_{i<j} U(a_i, a_j)\,\Delta_{ij},$$
where $a_i$ can be either H (hydrophobic) or P (polar), and $\Delta_{ij}$ is a contact matrix, defined to be 1 if monomers $i$ and $j$ are non-bonded nearest neighbors and 0 otherwise. The typical values for the interaction parameters are $U(H,H) = -1$ and $U(H,P) = U(P,P) = 0$. A chain of $N = 16$ monomers is amenable to complete enumeration of all 802075 possible symmetry-unrelated conformations, compact or not […]. For the above choice of contact energy parameters, 1539 (2%) of the $2^{16} = 65536$ possible sequences have a unique ground state […]. We compare this result with those obtained by using the HM1 energy, setting $\bar n(1) = 1.5$ (hydrophobic-like) and $\bar n(2) = 0.4$ (polar-like). A larger number of sequences, 10178 (16%), was found to have a unique ground state.
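The 2D counting described above can be sketched by brute force for very short chains. The sketch below assumes one common symmetry convention (first bond along +x, first turn towards +y) to pick one representative per class of rotations and reflections, counts the non-bonded neighbors $n(a_i)$ that enter HM1, and evaluates $E_{\mathrm{pair}}$ with the HP parameters quoted above. All function names are hypothetical, and a pure-Python loop over all 802075 conformations of every 16-mer sequence would be slow, so this is only an illustration for small $N$.

```python
from typing import List, Set, Tuple

Site = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def symmetry_unrelated_walks(n_monomers: int) -> List[List[Site]]:
    """Self-avoiding 2D conformations (n_monomers >= 2) up to rotations and
    reflections: first bond along +x, first turn off the x axis towards +y."""
    results: List[List[Site]] = []

    def grow(conf: List[Site], occupied: Set[Site], turned: bool) -> None:
        if len(conf) == n_monomers:
            results.append(list(conf))
            return
        x, y = conf[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            # Skip occupied sites; before the first turn, forbid turning to -y.
            if nxt in occupied or (not turned and dy < 0):
                continue
            conf.append(nxt)
            occupied.add(nxt)
            grow(conf, occupied, turned or dy != 0)
            conf.pop()
            occupied.discard(nxt)

    start = [(0, 0), (1, 0)]
    grow(start, set(start), False)
    return results


def nonbonded_contacts(conf: List[Site]) -> List[int]:
    """n(a_i): occupied nearest-neighbor sites of monomer i, excluding i-1, i+1."""
    index = {site: i for i, site in enumerate(conf)}
    # A missing neighbor defaults to index i itself, so |j - i| = 0 is not counted.
    return [sum(1 for dx, dy in MOVES
                if abs(index.get((x + dx, y + dy), i) - i) > 1)
            for i, (x, y) in enumerate(conf)]


def hp_energy(seq: str, conf: List[Site]) -> int:
    """E_pair with U(H,H) = -1, U(H,P) = U(P,P) = 0 (each pair counted once)."""
    index = {site: i for i, site in enumerate(conf)}
    energy = 0
    for i, (x, y) in enumerate(conf):
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[i] == seq[j] == "H":
                energy -= 1
    return energy


def ground_state(seq: str, confs: List[List[Site]]) -> Tuple[int, int]:
    """Return (lowest energy, number of conformations that attain it)."""
    energies = [hp_energy(seq, c) for c in confs]
    e_min = min(energies)
    return e_min, energies.count(e_min)


walks = symmetry_unrelated_walks(4)
print(len(walks))                                              # 5
print(nonbonded_contacts([(0, 0), (1, 0), (1, 1), (0, 1)]))    # [1, 0, 0, 1]
print(ground_state("HPPH", walks))                             # (-1, 1)
```

A sequence is counted as having a unique ground state when the degeneracy returned by `ground_state` is 1; swapping `hp_energy` for an HM1-type energy built on `nonbonded_contacts` would reproduce the structure of the comparison, under whatever functional form HM1 takes.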

We have also explored the case of three species of monomers, a number which, within a contact approximation of the interactions (as in $E_{\mathrm{pair}}$), is believed to epitomize the essential features of the interplay between folding and glass transitions in random heteropolymers […]. We chose at random 20177 sequences among the 2018016 possible ones with fixed composition $N(1) = 6$, $N(2) = N(3) = 5$, where $N(a)$ is the number of monomers of species $a$. Choosing the energy parameters $\bar n(1) = 0.4$, $\bar n(2) = 1.1$ and $\bar n(3) = 1.8$, we found 9439 (47%) sequences with a unique ground state.
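The sizes of the sequence spaces quoted in the 2D calculations follow from elementary counting: $2^{16}$ two-letter sequences, and the multinomial coefficient $16!/(6!\,5!\,5!)$ for the fixed three-species composition. The short check below recomputes them and the quoted percentages; `multinomial` is a hypothetical helper name.

```python
from math import factorial


def multinomial(*counts: int) -> int:
    """Number of distinct sequences with the given species composition."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # exact: each intermediate quotient is an integer
    return result


two_species = 2 ** 16                  # all H/P sequences of length 16
three_species = multinomial(6, 5, 5)   # N(1) = 6, N(2) = N(3) = 5

print(two_species, three_species)          # 65536 2018016
print(round(100 * 1539 / two_species))     # 2  (HP model)
print(round(100 * 10178 / two_species))    # 16 (HM1, two species)
print(round(100 * 9439 / 20177))           # 47 (three species, random sample)
```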

In the spirit of Mirny and Domany [32], a more realistic form for the hydrophobic energy is given by

This expression will be referred to as the hydrophobic model HM2. The parameters $\beta(a_i)$ capture the various degrees to which the different species of monomers tend to attain the preferred number $\bar n(a_i)$ of contacts. In the case of two species of monomers, we repeated the same calculation as for the HM1 model. Setting $\beta(1) = \beta(2) = 1$, we found 9821 (14%) sequences with a unique ground state. At least from the above calculations in 2D, we observe that the approximation of the hydrophobic interaction proposed in this work is capable of yielding foldable sequences.
We now turn to the calculations in 3D, which represent the essential part of this work and further illustrate the extent to which the present model embodies foldability. It is known that the HP model is pathological in 3D, since it is rather uncommon to have a sequence with a unique ground state with a large gap above it, although the situation can be different with a choice of parameters favoring more collapsed structures […]. We discuss here the general case of 20 species of monomers in the HM2 model. We compare these results with those obtained by using a common parametrization of the pairwise contact interaction matrix $U(a_i, a_j)$ in $E_{\mathrm{pair}}$, due to Miyazawa and Jernigan (MJ) [28], although other choices would be possible […]. For the HM2 model, we derived the 40 parameters $\bar n(a)$ and $\beta(a)$ from a statistical analysis of the non-redundant set of 246 protein structures reported by Hinds and Levitt. The procedure, similar to that of Mirny and Domany [32], is straightforward. For each amino acid species $a$, we computed the average, $\bar n(a)$, and the standard deviation, $\beta(a)$, of the number of contacts it forms in the set of experimentally known crystal structures (see Table I). Two amino acids are said to be in contact if their C$_\alpha$ atoms are closer than 8.5 Å in the native structure.
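The statistical procedure just described can be sketched as follows, assuming each structure is supplied as a list of (residue name, C$_\alpha$ coordinates in Å) pairs. Two choices are assumptions not stated in the text and are labeled in the code: sequence neighbors with $|i-j| \le 1$ are excluded from the contact count, and $\beta(a)$ is the population (not sample) standard deviation. Only intra-chain contacts are counted, and the function name is hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One residue = (three-letter name, C-alpha coordinates in Angstrom).
Residue = Tuple[str, Tuple[float, float, float]]


def contact_statistics(chains: Sequence[Sequence[Residue]],
                       cutoff: float = 8.5,
                       min_separation: int = 1) -> Dict[str, Tuple[float, float]]:
    """Return {species: (mean, std)} of per-residue C-alpha contact counts.

    ASSUMPTION: residues i, j are in contact if their C-alpha atoms are closer
    than `cutoff` and |i - j| > min_separation (chain neighbors excluded)."""
    counts: Dict[str, List[int]] = defaultdict(list)
    for chain in chains:
        for i, (name, xi) in enumerate(chain):
            n = sum(1 for j, (_, xj) in enumerate(chain)
                    if abs(i - j) > min_separation and math.dist(xi, xj) < cutoff)
            counts[name].append(n)
    # ASSUMPTION: population standard deviation (pstdev) for beta(a).
    return {name: (statistics.fmean(v), statistics.pstdev(v))
            for name, v in counts.items()}


toy_chain = [("ALA", (0.0, 0.0, 0.0)), ("GLY", (3.8, 0.0, 0.0)),
             ("ALA", (7.6, 0.0, 0.0))]
print(contact_statistics([toy_chain]))  # {'ALA': (1.0, 0.0), 'GLY': (0.0, 0.0)}
```

On a real data set, `chains` would hold one entry per protein, and the returned dictionary gives one ($\bar n(a)$, $\beta(a)$) pair per amino acid species.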

On the cubic lattice, the 103346 symmetry-unrelated maximally compact conformations of a polymer of length $N = 27$ can be enumerated in manageable computer time […]. If the ground state is guaranteed to be maximally compact, exact enumeration can be used to demonstrate its uniqueness. We adapted the energy parameters to the cubic lattice by matching the average number of contacts that a monomer forms on the $3\times3\times3$ cube with the average of the ideal number of contacts, $(1/20)\sum_{a=1}^{20}\bar n(a)$. This is achieved by rescaling the energy parameters in Table I by a factor of 3.315.
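The factor 3.315 can be recovered with a short calculation. A maximally compact 27-mer fills the $3\times3\times3$ cube, whose 54 nearest-neighbor site pairs are all occupied; 26 of them are chain bonds, leaving 28 non-bonded contacts, i.e. $2\cdot 28/27 \approx 2.074$ contacts per monomer. Dividing the mean of the $\bar n(a)$ column of Table I by this number gives the factor. The sketch assumes that rescaling means dividing the Table I values by the factor (shown for $\bar n(a)$ only; $\beta(a)$ would presumably be treated the same way).

```python
from itertools import product
from statistics import fmean

# Mean contact numbers n_bar(a) from Table I (20 amino acids, table order).
NBAR = [7.56, 5.62, 6.23, 5.51, 6.02, 7.63, 5.55, 5.86, 6.31, 8.29,
        6.58, 6.73, 5.73, 8.07, 7.72, 7.58, 7.45, 8.81, 7.67, 6.59]

L = 3
sites = set(product(range(L), repeat=3))
# Nearest-neighbor site pairs inside the cube, each counted once (+x, +y, +z).
nn_pairs = sum(1 for s in sites for axis in range(3)
               if tuple(c + (k == axis) for k, c in enumerate(s)) in sites)
n_monomers = len(sites)                        # 27
nonbonded = nn_pairs - (n_monomers - 1)        # compact chain occupies every site
mean_contacts_lattice = 2 * nonbonded / n_monomers

factor = fmean(NBAR) / mean_contacts_lattice
print(nn_pairs, nonbonded, round(factor, 3))   # 54 28 3.315

# ASSUMPTION: lattice parameters are the Table I values divided by the factor.
rescaled = [round(n / factor, 2) for n in NBAR]
```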

To characterize foldability, we first investigate the thermodynamic stability of the ground states of random sequences. A typical measure of thermodynamic stability is given by the Z score […], which is defined by $Z = (E_n - \langle E \rangle)/\sigma$, where $E_n$ is the energy of the ground state, $\langle E \rangle$ is the average energy, and $\sigma$ is the standard deviation of the energy distribution around the average. We measured the distribution of the Z scores for a set of 1000 random HM2 sequences. We found that only 2% of them had a unique lowest-energy state (the "ground state" among maximally compact conformations). Moreover, the average degeneracy was 22. For comparison, 99% of the 1000 random MJ sequences that we considered had a non-degenerate lowest-energy state, and the remaining ones had a very small degeneracy. The Z scores of the two models are compared in Fig. 1.
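Given the energies of one sequence over an enumerated set of conformations (for example the maximally compact ones), the Z score and the ground-state degeneracy can be computed as sketched below. The sketch assumes that $E_n$ is the lowest energy in the set, that $\sigma$ is the population standard deviation, and that energies within a small tolerance `tol` count as degenerate; the function name and tolerance are hypothetical. The score is undefined when all energies are equal, which would raise a division by zero here.

```python
import statistics
from typing import Sequence, Tuple


def z_score(energies: Sequence[float], tol: float = 1e-9) -> Tuple[float, int]:
    """Return (Z, degeneracy) with Z = (E_n - <E>) / sigma.

    E_n is the lowest energy in `energies`; degeneracy counts the conformations
    within `tol` of it."""
    e_n = min(energies)
    mean = statistics.fmean(energies)
    sigma = statistics.pstdev(energies)  # ASSUMPTION: population std deviation
    degeneracy = sum(1 for e in energies if abs(e - e_n) <= tol)
    return (e_n - mean) / sigma, degeneracy


z, g = z_score([-2.0, 0.0, 0.0, 2.0])
print(round(z, 4), g)  # -1.4142 1
```

Histogramming the Z values over many sequences, separately for random and designed ones, gives distributions of the kind compared in Fig. 1.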
FIG. 1. Normalized histograms of the Z scores for the HM2 (full lines) and MJ (dotted lines) models. Circles refer to random sequences and squares to designed ones.

The low values of the Z score and the large degeneracy found for the HM2 case reveal a shortcoming of enumerating only maximally compact conformations. By using other simulation techniques, such as standard lattice Monte Carlo (SMC) […] and the pruned-enriched Rosenbluth method (PERM) [44], we easily found non-compact lower-energy states for most of the HM2 sequences considered. It is known, however, that foldability is a property of selected sequences […]. One way to demonstrate that the HM2 model is foldable is to show that it is possible to select sequences whose ground states are both unique and maximally compact. The usual design procedure […], introduced to study pairwise interactions, prescribes choosing a target conformation and then searching sequence space for the sequence with minimal energy on that conformation. This procedure delivers a better Z score for the 1000 designed MJ sequences that we considered, as can be seen from Fig. 1. However, in the case of the HM2 model, we found that this technique is not sufficiently effective in designing out alternative conformations. A sequence design procedure similar to those proposed in Refs. […] proved to be more effective. Sequences selected in this way were found to have a unique ground state by exact enumeration among maximally compact conformations. More crucially, in no case have we been able to reach lower-energy states using dynamical simulation techniques such as the SMC and PERM algorithms. The histogram of the Z scores of the 100 HM2 sequences selected in this way is shown in Fig. 1.
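The "usual" design step described above (fix a target conformation, then minimize the energy over sequence space) can be sketched for the 2D HP pairwise model with an exhaustive search at fixed composition. This is an illustration only: it uses the HP parameters $U(H,H) = -1$, $U(H,P) = U(P,P) = 0$ rather than the MJ or HM2 energies, it is not the improved procedure of the cited Refs., and the function names are hypothetical. Because it scores only the target, it does nothing to penalize competing conformations, which is precisely the weakness noted above for HM2.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Site = Tuple[int, int]


def contact_pairs(target: Sequence[Site]) -> List[Tuple[int, int]]:
    """Non-bonded nearest-neighbor pairs (i < j) of a 2D lattice conformation."""
    index = {site: i for i, site in enumerate(target)}
    pairs = []
    for i, (x, y) in enumerate(target):
        for dx, dy in ((1, 0), (0, 1)):  # visit each lattice edge once
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return pairs


def design_on_target(target: Sequence[Site], n_h: int) -> Tuple[int, List[str]]:
    """Exhaustive search over HP sequences with n_h H monomers for the minimal
    pairwise energy (U(H,H) = -1, otherwise 0) on the fixed target."""
    pairs = contact_pairs(target)
    best_e: Optional[int] = None
    best: List[str] = []
    for h_sites in combinations(range(len(target)), n_h):
        h = set(h_sites)
        e = -sum(1 for i, j in pairs if i in h and j in h)
        seq = "".join("H" if k in h else "P" for k in range(len(target)))
        if best_e is None or e < best_e:
            best_e, best = e, [seq]
        elif e == best_e:
            best.append(seq)
    return best_e, best


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(design_on_target(square, 2))  # (-1, ['HPPH'])
```

For realistic lengths the exhaustive loop would be replaced by a stochastic search in sequence space, but the objective (energy on the target alone) stays the same.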

In summary, we have shown both in 2D and in 3D that it is possible to construct simple models of foldable heteropolymers by expressing the hydrophobic and the steric interactions at the level of individual monomers.

It is a pleasure to thank E. Domany and P. Grassberger for discussions.
Residue   n̄(a)   β(a)
ALA       7.56   2.98
GLU       5.62   2.17
GLN       6.23   2.36
ASP       5.51   2.41
ASN       6.02   2.58
LEU       7.63   2.25
GLY       5.55   3.51
LYS       5.86   2.08
SER       6.31   3.07
VAL       8.29   2.53
ARG       6.58   2.33
THR       6.73   2.77
PRO       5.73   2.85
ILE       8.07   2.35
MET       7.72   2.37
PHE       7.58   2.31
TYR       7.45   2.51
CYS       8.81   2.42
TRP       7.67   1.36
HIS       6.59   2.43

TABLE I. Mean $\bar n(a)$ and standard deviation $\beta(a)$ of the number of contacts of amino acids, obtained from a statistical analysis of a non-redundant set of 246 protein structures.